Instance Pruning Techniques
Authors
Abstract
The nearest neighbor algorithm and its derivatives are often quite successful at learning a concept from a training set and providing good generalization on subsequent input vectors. However, these techniques often retain the entire training set in memory, resulting in large memory requirements and slow execution speed, as well as a sensitivity to noise. This paper provides a discussion of issues related to reducing the number of instances retained in memory while maintaining (and sometimes improving) generalization accuracy, and mentions algorithms other researchers have used to address this problem. It presents three intuitive noise-tolerant algorithms that can be used to prune instances from the training set. In experiments on 29 applications, the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
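The paper's three noise-tolerant algorithms are not reproduced in this abstract, but the underlying idea can be sketched: drop a stored instance whenever the remaining set still classifies the training data correctly. The sketch below is a minimal, illustrative decremental rule for a k-NN classifier; the function names and the leave-one-out style check are assumptions, not the paper's actual algorithms.

```python
import numpy as np

def knn_predict(X, y, query, k=3):
    """Majority vote among the k nearest stored instances (integer labels assumed)."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y[nearest]).argmax()

def prune_instances(X, y, k=3):
    """Greedy decremental pruning: drop an instance if the remaining set
    still classifies every training instance correctly."""
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        keep[i] = False
        if not keep.any():          # never prune away the last instance
            keep[i] = True
            continue
        Xr, yr = X[keep], y[keep]
        ok = all(knn_predict(Xr, yr, X[j], k) == y[j] for j in range(len(X)))
        if not ok:
            keep[i] = True          # removal hurt accuracy, so restore it
    return X[keep], y[keep]
```

A real pruning method would typically order the removal candidates (e.g., by distance to the nearest enemy) and tolerate some misclassified noisy instances rather than requiring perfect accuracy, but the accept-or-restore loop above captures the basic storage-reduction mechanism.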
Similar References
Language model size reduction by pruning and clustering
Several techniques are known for reducing the size of language models, including count cutoffs [1], Weighted Difference pruning [2], Stolcke pruning [3], and clustering [4]. We compare all of these techniques and show some surprising results. For instance, at low pruning thresholds, Weighted Difference and Stolcke pruning underperform count cutoffs. We then show novel clustering techniques that...
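As a toy illustration of the simplest technique listed, count cutoffs, the sketch below drops rare n-grams from a count table; the cutoff value and the example counts are purely illustrative, not taken from the paper.

```python
from collections import Counter

def count_cutoff(ngram_counts, cutoff=1):
    """Keep only n-grams whose count exceeds the cutoff; pruned n-grams
    would be handled by backing off to lower-order estimates."""
    return Counter({ng: c for ng, c in ngram_counts.items() if c > cutoff})

# Illustrative bigram counts, not real data.
bigrams = Counter({("the", "cat"): 5, ("cat", "sat"): 1, ("sat", "on"): 3})
pruned = count_cutoff(bigrams, cutoff=1)   # drops ("cat", "sat")
```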
A Double Pruning Algorithm for Classification Ensembles
This article introduces a double pruning algorithm that can be used to reduce storage requirements, speed up the classification process, and improve the performance of parallel ensembles. A key element in the design of the algorithm is the estimation of the class label that the ensemble assigns to a given test instance by polling only a fraction of its classifiers. Instead of applying this f...
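One plausible reading of "polling only a fraction of its classifiers" is an early-stopping vote: stop querying ensemble members once the leading class can no longer be overtaken. The sketch below assumes each classifier is a callable returning a class label and is not necessarily the article's exact procedure.

```python
from collections import Counter

def early_vote(classifiers, x):
    """Poll ensemble members one by one and stop once the leading class
    cannot be overtaken by the votes still outstanding."""
    votes = Counter()
    remaining = len(classifiers)
    for clf in classifiers:
        votes[clf(x)] += 1
        remaining -= 1
        top_two = votes.most_common(2)
        lead = top_two[0][1] - (top_two[1][1] if len(top_two) > 1 else 0)
        if lead > remaining:       # runner-up can no longer catch up
            break
    return votes.most_common(1)[0][0]
```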
Optimal strategy in games with chance nodes
In this paper, games with chance nodes are analysed. Such game trees are evaluated with the expectiminimax algorithm. We present pruning techniques that handle random effects: gamma-pruning aims to increase the efficiency of expectiminimax, analogously to what alpha-beta pruning does for classical minimax. Some interesting properties of these games are shown: for instance, a game without dr...
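For context, a plain expectiminimax evaluator (without the gamma-pruning the paper describes, whose details are not given in this snippet) can be sketched as follows; the node representation is an assumption made for illustration.

```python
def expectiminimax(node):
    """Evaluate a tree with MAX, MIN and CHANCE nodes.

    A leaf is a plain number; an inner node is assumed to expose `kind`
    ("max", "min" or "chance") and `children` (chance children are
    (probability, child) pairs).  This layout is illustrative only.
    """
    if isinstance(node, (int, float)):
        return float(node)
    if node.kind == "chance":
        return sum(p * expectiminimax(child) for p, child in node.children)
    values = [expectiminimax(child) for child in node.children]
    return max(values) if node.kind == "max" else min(values)
```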
A Pruning Based Approach for Scalable Entity Coreference
Entity coreference is the process of deciding which identifiers (e.g., person names, locations, ontology instances) refer to the same real-world entity. In the Semantic Web, entity coreference can be used to detect equivalence relationships between heterogeneous Semantic Web datasets, explicitly linking coreferent ontology instances via the owl:sameAs property. Due to the large scale of Sema...
Optimistic pruning for multiple instance learning
This paper introduces a simple evaluation function for multiple instance learning that admits an optimistic pruning strategy. We demonstrate results comparable to state-of-the-art methods while using significantly fewer computational resources.
Pruning of redundant synthesis instances based on weighted vector quantization
A new method is proposed for pruning redundant synthesis unit instances in a large-scale synthesis database, based on weighted vector quantization (WVQ). WVQ takes the relative importance of each instance into account when clustering similar instances with the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective ...
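One way to picture weight-aware pruning, not necessarily the paper's WVQ formulation, is a weighted k-means pass that keeps a single representative instance per codeword; the function, parameter names, and codeword count below are illustrative assumptions (and `n_codewords` must not exceed the number of instances).

```python
import numpy as np

def weighted_vq_prune(X, weights, n_codewords=16, iters=20, seed=0):
    """Weighted k-means over the instances, then keep one representative
    instance per codeword; everything else is treated as redundant."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_codewords, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every instance to its nearest codeword.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(n_codewords):
            mask = assign == c
            if mask.any():
                w = weights[mask][:, None]          # per-instance importance
                centroids[c] = (w * X[mask]).sum(axis=0) / w.sum()
    # Retain the instance closest to each codeword as its representative.
    reps = {np.linalg.norm(X - c, axis=1).argmin() for c in centroids}
    return X[sorted(reps)]
```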